SUMMARY


Sequencing of part of a clone from a transmissible gastroenteritis virus genome 
cDNA library led to the identification of the gene encoding the E1 matrix protein. The 
amino acid sequence of the primary translation product predicts a polypeptide of 262 
residues which shares many features with the previously characterized murine 
hepatitis virus and infectious bronchitis virus E1 proteins. However, N-terminal amino 
acid sequencing revealed that a putative signal peptide of 17 residues was absent in the 
virion-associated polypeptide. The predicted mol. wt. of the mature unglycosylated 
product, 27800, is in agreement with the experimental Mr value. 


INTRODUCTION 


Transmissible gastroenteritis (TGE) is a highly contagious disease of pigs causing high 
mortality in neonates. The causal agent (TGE virus, TGEV) belongs to the family 
Coronaviridae, a group of enveloped viruses with a large, positive-stranded RNA genome (for 
review, see Siddell et al., 1983). Coronavirus-encoded information is expressed in the cell 
through a nested set of subgenomic mRNAs with common 3’-terminal sequences. The coding 
part of each mRNA corresponds approximately to the 5’-terminal sequences that are absent in 
the next smaller species. Mouse hepatitis virus (MHV) and infectious bronchitis virus (IBV) 
mRNAs contain at the 5’ end a short non-coding sequence joined to the body sequences by 
discontinuous transcription; a consensus sequence identified at each intergenic region may act 
as a binding site for the RNA polymerase-leader complex (Spaan et al., 1983; Brown et al., 
1984; Lai et al., 1984; Budzilowicz et al., 1985). 

TGEV contains three major structural polypeptides: the peplomer glycoprotein E2 (200K to 
220K), which forms the distinctive surface projections, the transmembrane or matrix protein E1 
(29K ± 1K), and the nucleoprotein N (47K ± 1K), in which a single infectious RNA molecule 
of 20 kb or more is embedded (Garwes et al., 1976; Laude et al., 1986; Brian et al., 1980). In 
TGEV-infected cells, five species of subgenomic mRNA have been characterized, in vitro 
translation of which has allowed partial coding assignment (Hu et al., 1984; Jacobs et al., 1986; 
Rasschaert et al., 1987). 

The matrix protein has been the subject of intensive studies in other coronaviruses. The 
N-terminal regions of E1 proteins of MHV and bovine coronavirus (BCV) bear sugar chains 
O-linked to Ser or Thr residues (Niemann & Klenk, 1981; Niemann et al., 1984), an unusual 
feature among viral glycoproteins. Also, unlike the majority of integral membrane proteins, 
both MHV and IBV E1 proteins lack a cleaved signal peptide (Rottier et al., 1984; Stern & 
Sefton, 1982a). The restriction of E1 to internal membranes seems to determine the assembly of 
the coronavirus particles in the lumen of the endoplasmic reticulum (Holmes et al., 1981; Tooze 
et al., 1985). A model of the membrane topology of E1 has recently emerged from a combination 
of biochemical data and analysis of its primary structure (Armstrong et al., 1984; Rottier et al., 
1986). The amino half is formed of a transmembrane helix with two hairpin structures, whereas 
the carboxy half is closely adjacent to the inner surface of the viral envelope and is largely 
resistant to proteolysis. At each end, two short hydrophilic segments are assumed to project from 
either side of the membrane. 

In an attempt to elucidate the functional domains of the TGEV surface glycoproteins, a 
cDNA library of the TGEV genome has been created in this laboratory. In this paper, we report 
the sequence of part of a clone encoding the E1 gene, present information about the N-terminal 
processing of its product and compare the features predicted from its primary structure with 
those of the other E1 coronavirus proteins. 


METHODS 

cDNA cloning. The preparation of cDNA clones will be described in more detail in a subsequent 
communication. cDNA was obtained from purified genomic RNA (TGEV Purdue-115 strain) using 
oligo(dT)12-18 (Pharmacia) as primer and reverse transcriptase (Stehelin, Basel, Switzerland). RNase T2-treated 
cDNA-RNA hybrids were dC-tailed and inserted in PstI-cut dG-tailed pBR322 (Bethesda Research 
Laboratories) (Zain et al., 1979; Van der Werf et al., 1981). Escherichia coli RR1 cells were transfected with this 
material. An insert (5 kb) covering the complete E1 coding region was located in the clone pTG2.15. 

DNA sequencing. Sonicated fragments of pTG2.15 were subcloned into SmaI-cut M13mp18 vector (Deininger, 
1983). Sequencing was performed by Sanger’s dideoxy technique using [35S]dATP (New England Nuclear) as the 
label and the reaction products were analysed in buffer gradient gels. Each strand of DNA was sequenced at least 
three times. 

Sequence analysis. Sequences were analysed using the program of Queen & Korn (1984), marketed as part of the 
Microgenie program (March 1985 version, Beckman), developed for the IBM PC-XT microcomputer. The 
program, utilizing the hydrophilicity values of Hopp & Woods (1981), the mean fractional area exposed values of 
Rose et al. (1985), and the flexibility values (predicted Bnorm. data) of Karplus & Schulz (1985) was written in 
Apple Basic (F. Borras-Cuesta & H. Laude, unpublished data). 
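
The original Apple Basic program is unpublished, so the following Python sketch only illustrates the general approach for the hydrophilicity profile: a running average of the Hopp & Woods (1981) values over a hexapeptide window, as stated in the legend of Fig. 4. The function name `running_average`, the variable names and the choice of test sequence (the start of the predicted E1 precursor as read from Fig. 2) are assumptions made for illustration, not the authors' implementation.

```python
# Hopp & Woods (1981) hydrophilicity values, one-letter amino acid code.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def running_average(sequence, scale, window=6):
    """Return the mean scale value of every full window along the sequence."""
    values = [scale[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Start of the predicted E1 precursor (signal peptide-like region followed by
# the first residues of the mature protein), transcribed from Fig. 2.
precursor_start = "MKILLILACVIACACGERYCAMKSDTDLSCRNSTASD"
profile = running_average(precursor_start, HOPP_WOODS)

# Negative values mark hydrophobic windows, positive values hydrophilic ones.
for position, value in enumerate(profile, start=1):
    print(f"{position:3d}  {value:6.2f}")
```

The same function can be reused for the exposure and flexibility profiles by passing a different scale dictionary and window length.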

Isolation of the E1 polypeptide and partial amino acid analysis. Purified virion polypeptides were resolved by SDS- 
PAGE as described by Laude et al. (1986). After localization by gel slice staining, the protein band was excised 
from the gel and placed in an electroelution chamber (Isco). The protein was eluted for 24 h in 50 mM-NaHCO3 + 
0.1% SDS. The reservoir buffer was subsequently changed to 10 mM-NaHCO3 + 0.01% SDS and electrodialysis 
was performed overnight (Hunkapiller et al., 1983). About 100 pmol of protein was subjected to N-terminal amino 
acid sequencing. Sequential Edman degradation was done on a ‘gas phase’ Applied Biosystems 470A apparatus 
with its dedicated on-line PTH amino acid analyser 120A. 


RESULTS 


A long open reading frame (ORF) of 867 bases, yielding a protein with the properties of E1, 
was identified on clone pTG2.15 of TGEV cDNA. The 5’ end of this ORF mapped at 2.48 kb 
from the 3’ end of the genome (Rasschaert et al., 1987). According to its length and position, the 
E1 ORF corresponded to the ‘unique’ region of the mRNA 5 within the set of viral RNAs 
characterized by Northern blot analysis (data not shown). 

Inspection of the nucleotide sequence displayed in Fig. 1 revealed the presence, near each 
extremity, of two AACTAAAC sequences, which were assumed to be the start of the mRNA 
transcripts 5 (E1-encoding) and 6 (N-encoding) respectively. Therefore, although the ORF 
extended 23 codons upstream from the first consensus sequence, it was postulated that initiation 
of translation on mRNA 5 should occur downstream at either of the two proximal ATGs 
available. The ATG adjacent to the consensus sequence is followed by a characteristically 
hydrophobic stretch of amino acids, which may act as a signal peptide for the 
translocation of E1. Alternatively, translation might start at the next ATG codon (position 184), 
yielding a product of 241 residues, still slightly larger in size than the matrix proteins of other 
coronaviruses. However, the second ATG lies in a less favourable context than the first for 
initiation of translation (Kozak, 1983). 
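
The residue counts quoted for the two candidate initiation codons follow from simple codon arithmetic, which can be sketched as below. The positions 121 and 184 and the length of 262 residues are taken from the text and Fig. 2; the helper name `codons_between` and the derived stop codon position are illustrative assumptions.

```python
CODON_LENGTH = 3


def codons_between(start, end):
    """Number of whole codons from base `start` up to, but excluding, base `end`."""
    span = end - start
    if span < 0 or span % CODON_LENGTH:
        raise ValueError("positions are not in the same reading frame")
    return span // CODON_LENGTH


first_atg = 121      # ATG adjacent to the consensus sequence
second_atg = 184     # next in-frame ATG
full_length = 262    # residues predicted when starting at the first ATG

skipped = codons_between(first_atg, second_atg)
length_from_second_atg = full_length - skipped

# Derived value (an inference, not stated in the text): first base of the stop codon.
stop_codon_start = first_atg + full_length * CODON_LENGTH

print(skipped, length_from_second_atg, stop_codon_start)
```

Under these assumptions the second ATG lies 21 codons downstream of the first, which gives the 241 residue product mentioned above.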

Partial microsequencing of the mature E1 polypeptide was performed to confirm the site of 
translation initiation (Fig. 2). The N-terminal residues thus identified were found in perfect 
agreement with the predicted sequence up to position 14, beyond which the sequencing process 
was perturbed. This led us to conclude that: (i) translation of E1 cannot be initiated at the second 
ATG codon (position 184) and (ii) a hydrophobic peptide specified by the first 17 codons of the 
gene was not present in virion-associated E1. 
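
The reasoning behind conclusions (i) and (ii) can be illustrated with a short Python sketch that locates the microsequenced residues within the predicted precursor. The sequences are transcribed from Fig. 1 and Fig. 2; the `?` symbol for cycles where no PTH derivative was detected and the function name `find_offsets` are assumptions introduced here for illustration.

```python
def find_offsets(precursor, determined, unknown="?"):
    """Return every 0-based offset at which the determined residues fit the precursor."""
    hits = []
    for offset in range(len(precursor) - len(determined) + 1):
        window = precursor[offset:offset + len(determined)]
        if all(d == unknown or d == p for d, p in zip(determined, window)):
            hits.append(offset)
    return hits


# Predicted start of the E1 precursor, beginning at the ATG at position 121.
precursor = "MKILLILACVIACACGERYCAMKSDTDLSCRNSTASD"
# First 14 Edman cycles; '?' marks the two cycles with no PTH derivative.
determined = "RY?AMKSDTDLS?R"

offsets = find_offsets(precursor, determined)

# Residue index (0-based) encoded by the second ATG at position 184.
second_atg_index = (184 - 121) // 3

print(offsets, second_atg_index, precursor[second_atg_index])
```

The single match at offset 17 means that 17 residues precede the mature N terminus, and since that N terminus lies upstream of the Met encoded by the second ATG, initiation at position 184 is excluded.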
[Fig. 1: nucleotide sequence of the sequenced region of pTG2.15, numbered every 30 bases, with the three-letter amino acid translation beneath each codon.]


Fig. 1. Part of the sequence (957 bases) from the TGEV genomic cDNA clone pTG2.15. The main 
ORF, encoding the E1 protein, and part of the ORF corresponding to the adjacent nucleocapsid gene 
(broken line) are translated in the three letter amino acid code. The two consensus sequences are boxed. 
Proximal ATG codons are underlined, stop codons are overlined. Dots beneath Asn residues indicate 
N-glycosylation signals (Asn-X-Ser or Asn-X-Thr). The signal peptide-like sequence is underlined. 


(a) [Alignment panel: residues determined by N-terminal microsequencing shown over the predicted sequence of the E1 precursor, numbered from −17 to 40, with the ATG codons at positions 121 and 184 marked.]

(b) Amino acid  Arg  Tyr  (c)  Ala  Met  Lys  Ser  Asp  Thr  Asp  Leu  Ser  (c)  Arg
    pmol PTH     90  131   -   175   98  107   39   66   23   34   34   39   -    34


Fig. 2. Proximal amino acid data of TGEV E1 polypeptide. (a) Residues identified by partial N- 
terminal microsequencing of the virion protein are aligned over the predicted sequence of the E1 
precursor. The uncharged region of the putative signal peptide is boxed. The arrow indicates the 
proposed site of entry of E1 into the lipid membrane. (b) Quantity of PTH derivative measured for the 
first 14 residues. Symbols: c, Cys suspected as no PTH derivative was detected; u, residue 
undetermined; ●, as in Fig. 1. 
DISCUSSION 


A cDNA cloned from TGEV genomic RNA was sequenced in the region corresponding to 
the coding part of mRNA 5 (Fig. 1). A single long ORF was found, which is shown to direct the 
synthesis of a coronavirus matrix-like protein. Only one additional ORF of more than 20 amino 
acids was detected, which was 43 residues long and within the E1 gene (position 240). Partial 
microsequencing of the virion-associated E1 polypeptide unambiguously established that the 
first residue was the Arg specified by the CGC codon at position 172 in Fig. 1. Hence it can be 
deduced that the single ATG codon available between the CGC codon and the upstream 
consensus sequence CTAAAC should be the functional initiation codon of mRNA 5. It predicts 
a primary translation product of 262 amino acids with mol. wt. 29.6K, slightly higher than the 
Mr 25K reported for the in vitro translation product of mRNA 5 (Jacobs et al., 1986; referred to 
as mRNA 6 in their study). 

The first 17 residues predicted from the nucleotide sequence were found to be lacking in the 
mature protein (Fig. 2). This oligopeptide actually fulfils the criteria of a eukaryotic signal 
peptide, namely a net charge immediately after the N-terminus and high degree of 
hydrophobicity of the 14 residue long uncharged region (McGeoch, 1985; Von Heijne, 1986). A striking 
feature is the presence of a triple Ala-Cys repeat, which also occurs in the signal peptide of rat 
oxytocin (see McGeoch, 1985). Also, the cleavage site appeared to be located between Gly and 
Arg, as predicted by the 75 to 80% accurate weight-matrix approach of Von Heijne (1986). 

These results indicate that TGEV E1 matures through the removal of a 17 amino acid leader 
peptide. Accordingly, the final product is 245 amino acids long, and has a predicted mol. wt. of 
27780 in the unglycosylated form; it is basic (with five net charges at neutral pH) and 44% of the 
residues are hydrophobic. This finding is in contrast with that reported for the matrix proteins of 
two other coronaviruses, MHV and IBV (Rottier et al., 1986; Armstrong et al., 1984; Stern & 
Sefton, 1982a; Boursnell et al., 1984). It has been proposed that the matrix proteins of the latter 
are inserted into the membrane by the recognition of an internal transmembrane region as a 
signal sequence (Rottier et al., 1985). Incidentally, in the IBV E1 sequence (Boursnell et al., 
1984), the 22 in-frame codons between the consensus sequence and the initiation codon predict 
numerous hydrophobic residues, which might be the remnant of an ancestral signal peptide. 

Pairwise comparisons of the gene sequence of TGEV E1 with those of MHV and IBV at the 
DNA level revealed no significant homology. The amino acid sequences showed, in contrast, a 
remarkable homology. The homologies found by Dayhoff’s optimal alignment are 38% 
(TGEV-MHV), 30% (MHV-IBV) and 27% (TGEV-IBV). The main regions of homology are shown in 
Fig. 3. An eight amino acid section is perfectly conserved among the three viruses (residues 128 
to 135, TGEV). Of the three potential membrane-spanning regions (thickly underlined), the 
second is well conserved within the MHV-IBV pair, and the third within the TGEV-MHV 
pair; only the first shows a nearly equal degree of homology within both pairs. This might be 
indicative of functional differentiation between the three segments. 
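
How a pairwise homology percentage is obtained from an alignment can be sketched in Python as follows. This is only a minimal illustration: the function name `percent_identity`, the convention of ignoring gapped columns, and the two short aligned strings are hypothetical, and the sketch does not reproduce Dayhoff's optimal alignment procedure or the authors' exact scoring.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: columns containing a gap are not scored
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments, for illustration only.
fragment_a = "WNFS-A"
fragment_b = "WNFSLG"
print(percent_identity(fragment_a, fragment_b))
```

Different conventions for the denominator (ungapped columns, full alignment length, or the shorter sequence) give different percentages, which is one reason why values obtained with different alignment methods are not directly comparable.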

The above findings suggested that the topology of TGEV E1 within the membrane might be 
essentially similar to that proposed for MHV and IBV (Armstrong et al., 1984; Boursnell et al., 
1984; Rottier et al., 1986). This is supported by the data presented in Fig. 4, where each profile 
corresponds to a computer-assisted prediction of the local tendency of the TGEV E1 
polypeptide chain to hydrophilicity (Hopp & Woods, 1981), accessibility (Rose et al., 1985) or 
mobility (Karplus & Schulz, 1985). By combining our data with those cited above, five regions 
can be delineated from the amino to the carboxy end: a signal peptide (−17 to −1), an exposed 
glycosylated segment (1 to 29), three lipid bilayer-incorporated segments (30 to 55, 66 to 86, 98 to 
117), an amphiphilic C-terminal half supposedly associated to the cytoplasmic face of the 
membrane (118 to 228) and a protruding C terminus. 

Unlike the MHV and BCV matrix proteins, glycosylation of TGEV E1 has been reported to 
be of the N-linked type (Garwes et al., 1984; Jacobs et al., 1986), as for IBV (Stern & Sefton, 
1982b). Indeed, two potential N-glycosylation sites are available near the N terminus of the 
sequence (Fig. 2). The accessibility of the second site (Asn, 38) in the lumen of the endoplasmic 
reticulum is uncertain since the first putative membrane-spanning segment of TGEV E1 could 
extend virtually up to position 30, i.e. seven residues farther than the WNFS sequence, where 


[Fig. 3: alignment of the TGEV, MHV and IBV E1 amino acid sequences (single letter code), numbered according to the TGEV E1 protein sequence.]


Fig. 3. Comparison of the amino acid sequences (single letter code) of the E1 polypeptides of TGEV, 
MHV (strain A59: Armstrong et al., 1984) and IBV (Beaudette strain: Boursnell et al., 1984). Dots 
denote a match between two residues; matches between the TGEV and the IBV sequences are 
indicated beneath the latter. With the simple alignment used, the homology within each E1 pair is 23% 
(TGEV-MHV), 26% (MHV-IBV) and 15% (TGEV-IBV). Boxed regions show homologies ≥60% 
between two or three sequences. Potential glycosylation sites are underlined. Thick bars indicate the 
three putative membrane-spanning segments (see text). 
Fig. 4. Graphical output of three prediction methods applied to the amino acid sequence of TGEV E1 
precursor polypeptide. (a) Hydrophilicity profile with a running average taken over a hexapeptide. (b) 
Exposure profile averaged as in (a); the straight line represents the mean area exposed calculated for the 
whole E1 chain. (c) Flexibility profile, with calculations made on a window of seven residues. 
the MHV spanning segment is assumed to end (see Fig. 3). In contrast the side chain of Asn at 
residue 15 may be linked to an oligosaccharide residue, as suggested by the disturbance observed 
at this point during the N-terminal sequencing (see Fig. 2). The supposition that only the first 
N-glycosylation site is functional leads to an estimated mol. wt. of 29.5K to 30K, a value close to the 
Mr value of the E1 major species determined by electrophoresis (since the Mr of a 
carbohydrate-rich mannose chain is about 2K; Klenk & Rott, 1980). Minor E1 species consistently observed 
as bands migrating more slowly in SDS-PAGE (Garwes & Pocock, 1975; Hu et al., 1984; Laude 
et al., 1986; Jacobs et al., 1986) might reflect a heterogeneity in the oligosaccharide chain rather 
than in the polypeptide chain (for example an oversized E1 polypeptide produced from mRNA 
4, only 10% larger than mRNA 5). This viewpoint is supported by the fact that TGEV E1 
yielded a single band after endoglycosidase H treatment (Hu et al., 1984; B. Delmas & H. Laude, 
unpublished results). 
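
The location of the two potential N-glycosylation signals and the resulting mass estimate can be checked with a short Python sketch. The N-terminal sequence of mature E1 is transcribed from Fig. 2 and Fig. 3; the exclusion of Pro at the middle position is a commonly applied refinement that is an assumption here (the legend of Fig. 1 only specifies Asn-X-Ser or Asn-X-Thr), and the function name is illustrative.

```python
import re

# Asn-X-Ser/Thr signal; a lookahead is used so that overlapping signals are found.
# Excluding Pro at position X is an assumption not stated in the text.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(sequence):
    """Return 1-based positions of Asn residues within Asn-X-Ser/Thr signals."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


# First 40 residues of mature E1 (numbering starts at the N-terminal Arg).
mature_start = "RYCAMKSDTDLSCRNSTASDCESCFNGGDLIWHLANWNFS"
sites = n_glycosylation_sites(mature_start)

# Mass estimate if only the first site carries one chain of about 2K.
unglycosylated_mass = 27780
chain_mass = 2000
estimated_mass = unglycosylated_mass + chain_mass

print(sites, estimated_mass)
```

The scan returns positions 15 and 38, and adding a single chain of about 2K to the unglycosylated product gives a value within the 29.5K to 30K range quoted above.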

To sum up, the E1 protein of TGEV shares many structural features with those of MHV and 
IBV. It is becoming clear, however, that a certain diversity may exist despite the constraints that 
are necessary to achieve the distinctive architecture of a coronavirus particle. This is true at least 
for the small hydrophilic region protruding out of the particle, to which no biological function 
has been assigned so far. Our results provide substantial evidence that TGEV E1 undergoes 
N-terminal processing and that its exposed NH2 extremity may be significantly larger and possibly 
more complex in secondary structure than those of IBV and MHV. Protease digestion has been 
shown to remove an external glycopeptide of nine residues (IBV; Cavanagh et al., 1986) and 
2.5K (MHV; Rottier et al., 1984). In comparison, the TGEV E1 external segment may approach 
30 residues, including four cysteines (see Fig. 2a and 3). Recent investigations have suggested an 
involvement of TGEV E1 in the induction of interferon α from non-immune lymphocytes (see 
Rasschaert et al., 1987; B. Charley & H. Laude, unpublished observation). An interesting 
possibility is raised that this previously unrecognized activity of E1 might proceed through the 
interaction of its NH2 free tail with the lymphocyte surface. Additional experiments are 
currently underway in order to study this question. 


We thank J. Gelfi for her excellent technical assistance and J. C. Pernollet (Head of Laboratoire d’Etude des 
Protéines, Versailles, France) for supporting the collaboration. Part of the results were presented at the Third 
International Coronavirus Symposium (Asilomar, September 1986). 